Empirical and rational approaches
INTRODUCTION


Purpose


The purpose of this study is to compare two approaches to keying a biographical inventory. One approach consists of the empirical use of external criteria. This is the well-known technique of selecting from a pool of items those which yield a maximum correlation with a criterion. The other approach consists of the development of homogeneous and relatively independent keys. These keys will show high internal consistency, and the item selection will not depend upon a relationship with an external criterion.


After the homogeneous keys are validated on the same external criteria with which the empirical keys were developed, both sets of keys are to be evaluated by means of cross-validation on a new sample. The biographical inventory used is at present administered experimentally to officer candidates and is known by the code number CE608C. This study has implications for the construction, analysis, and use of such tests as they are applied in educational, vocational, and personality guidance.
Historical Background


Empirical Keying


Twenty-three empirical methods have been described by Long and Sandiford (18). Gulliksen (11) notes that, with a few notable exceptions (1, 15, 20, 22, 24), nearly all of these overlook the theoretical aspects of the reliability and validity of the total test.

Apart from the methods listed in the Long and Sandiford survey, others have since been proposed. Several of these are presented below as representative procedures of empirical keying.

Horst (12) has devised a method which involves the computation of the mean criterion score and the mean total test score of all subjects who answered any particular item correctly. By plotting these means, the method retains the items with the largest index. The author claims this method is less time-consuming and, at the same time, yields validities at least as high as those of his method of successive residuals.
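The passage does not define Horst's index, so the following is only a sketch under stated assumptions. It computes the two plotted coordinates for each item: the mean total score and the mean criterion score of the examinees who pass the item. It then ranks items by a hypothetical index, the amount by which the passers stand higher on the criterion than on the total test, with both measured in standard-score units. Items are assumed to be scored 1 (correct) or 0. The names `horst_points`, `hypothetical_index`, and `select_items` are illustrative.

```python
from statistics import mean, pstdev


def horst_points(items, criterion):
    """For each item, return (mean total score, mean criterion score) of the
    examinees who answered it correctly. `items` holds one 0/1 row per examinee."""
    totals = [sum(row) for row in items]
    points = []
    for j in range(len(items[0])):
        passers = [i for i, row in enumerate(items) if row[j] == 1]
        # statistics.mean raises StatisticsError if nobody passed the item.
        points.append((mean(totals[i] for i in passers),
                       mean(criterion[i] for i in passers)))
    return points


def hypothetical_index(items, criterion):
    """Hypothetical index (not Horst's published one): the passers' mean
    criterion z-score minus their mean total-score z-score."""
    totals = [sum(row) for row in items]
    mt, st = mean(totals), pstdev(totals)
    mc, sc = mean(criterion), pstdev(criterion)
    return [(c - mc) / sc - (t - mt) / st
            for t, c in horst_points(items, criterion)]


def select_items(items, criterion, k):
    """Keep the k items with the largest hypothetical index."""
    index = hypothetical_index(items, criterion)
    return sorted(range(len(index)), key=lambda j: index[j], reverse=True)[:k]
```

This index favours items whose passers do better on the criterion than their total scores alone would suggest. Any other index read off the same plot could be substituted.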

Flanagan (7) suggests a method of item selection in which a nucleus of the most valid items is first selected. Items are then added to or removed from this nucleus by comparing each item's correlation with the nucleus against its correlation with the criterion. The items having a higher correlation with the criterion than with the nucleus are retained, while the others are dropped. The cycle can be repeated, but Flanagan notes that additional cycles yield only a small increment of improvement.
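A single Flanagan cycle can be sketched as follows. The sketch assumes 0/1 item scores and a continuous criterion. It also makes one design choice the text does not specify: a nucleus member is compared with the rest of the nucleus rather than with a total that includes itself. The function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flanagan_cycle(items, criterion, nucleus):
    """One cycle of nucleus-based selection: an item is retained when it
    correlates more highly with the criterion than with the nucleus score.

    `items` holds one row of item scores per examinee; `nucleus` lists the
    column indices of the starting nucleus (at least two items)."""
    kept = []
    for j in range(len(items[0])):
        col = [row[j] for row in items]
        # Assumption: a nucleus member is compared with the rest of the
        # nucleus, so that the item is not correlated with itself.
        others = [k for k in nucleus if k != j]
        nucleus_score = [sum(row[k] for k in others) for row in items]
        if pearson(col, criterion) > pearson(col, nucleus_score):
            kept.append(j)
    return kept
```

To repeat the cycle, pass the returned list back in as the new nucleus.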

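Gleser and DuBois (8) have developed a method very similar to that of Flanagan. However, they use the item-criterion and item-test correlations to compute an index for each item of the form:

$$
\frac{r_{ic}}{r_{it} \pm \dfrac{\sigma_i}{2\sigma_t}}
$$

where $r_{ic}$ is the correlation of item $i$ with the criterion, $r_{it}$ its correlation with the nucleus test $t$, $\sigma_i$ the standard deviation of the item, and $\sigma_t$ the standard deviation of the nucleus test. This index provides a correction for whether or not the item is included in the nucleus "t." It also takes into account the changes in item-total correlations that follow once the first selection is made. It provides an exact criterion of how many items to retain in the final test.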
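The index can be sketched in code. The sketch assumes the form r_ic / (r_it ± σ_i / 2σ_t), with the correction subtracted for items already in the nucleus and added for the others. Under the first-order reasoning that motivates this form, an item raises the key's validity when its index exceeds the current key validity r_tc, so `select_items` keeps exactly those items. This is a sketch, not Gleser and DuBois's published computing routine. The approximation works best when each item is a small part of the key, and the function names are illustrative.

```python
from math import sqrt
from statistics import pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def gleser_dubois_index(items, criterion, key):
    """Index r_ic / (r_it +/- s_i / (2 s_t)) for every item against the key t.
    The correction is subtracted for items in the key and added otherwise."""
    total = [sum(row[k] for k in key) for row in items]
    s_t = pstdev(total)
    index = []
    for j in range(len(items[0])):
        col = [row[j] for row in items]
        sign = -1.0 if j in key else 1.0
        adjusted = pearson(col, total) + sign * pstdev(col) / (2 * s_t)
        index.append(pearson(col, criterion) / adjusted)
    return index


def select_items(items, criterion, key):
    """One selection pass: keep the items whose index exceeds the current
    key validity r_tc (assumes positive denominators)."""
    total = [sum(row[k] for k in key) for row in items]
    r_tc = pearson(total, criterion)
    index = gleser_dubois_index(items, criterion, key)
    return [j for j, value in enumerate(index) if value > r_tc]
```

Calling `select_items` repeatedly, feeding each result back in as the key, continues the cycles until the key stops changing.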

Dailey (14) has presented a relatively recent method for keying biographical data empirically. This method grew out of the inadequacy of selecting only those responses whose validity coefficients exceed a given level of significance. In this method, called the "pattern of response method," all possible responses are correlated with a criterion, yielding continua of correlations for multiple-choice items. Items whose correlations show a consistent direction are keyed. Positive or negative unit weights are assigned according to the sign of the coefficient, and usually only the extreme responses are keyed. When cross-validated on subsequent samples, this method showed less shrinkage and greater validity than simply choosing significant items.
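The following sketches the pattern-of-response idea for one five-choice item. It assumes that a "consistent direction" means the option-criterion correlations rise or fall steadily across the continuum. It also assumes that only the two end options receive unit weights. Both readings, and the function names, are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant
    (an assumption covering options that nobody chose)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def option_correlations(responses, criterion, n_options=5):
    """Correlate each response option (1 if chosen, else 0) with the criterion."""
    return [pearson([1 if r == k else 0 for r in responses], criterion)
            for k in range(1, n_options + 1)]


def consistent_direction(rs):
    """True if the option correlations rise, or fall, steadily along the continuum."""
    steps = [b - a for a, b in zip(rs, rs[1:])]
    return all(s >= 0 for s in steps) or all(s <= 0 for s in steps)


def key_extremes(rs):
    """Unit weights (+1 or -1, by sign of r) for the two extreme options only."""
    def unit(r):
        return 1 if r > 0 else -1
    return {1: unit(rs[0]), len(rs): unit(rs[-1])}
```

An item would be keyed only when `consistent_direction` is true. Its score for an examinee is then the weight of the option chosen, or 0 for a middle option.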

Each of the preceding techniques represents the empirical approach to keying items. With slight modifications, or with combinations of two or more of the principles inherent in each method, any one of them may serve as the representative of the empirical approach for the purposes of this study. Systematic comparisons of the methods of item analysis (8, 11, 15, 18, 19, 23) have not provided a satisfactory basis for selecting the best method. The choice of method seems to depend upon three things: the labor involved relative to the increase in validity obtained, the stability of the validity coefficient in subsequent samples, and the ultimate purpose for which the test will be used. It is to be noted that, with the exception of the "pattern of response method," there is little evidence for the greater stability of any one method over any other. It is to be emphasized, furthermore, that almost all the methods have some points in common with others. At least one fact, however, provides a basis for selecting the method to be used in this study. It has long been known that, given n items with identical validities, the two items having the lowest correlation with each other will predict the criterion better than any other possible pair of items. In other words, the items selected for inclusion in an empirical key should contribute as much unique valid variance as possible. Therefore, in the development of an empirical key, the intercorrelations between the items of the key should be considered. This may be done directly by considering the item relationships, or indirectly by considering the item-total test relationships. The Gleser and DuBois method of maximizing test validity (8) was selected on the basis of the latter consideration.
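The claim about pairs of equally valid items follows from the standard two-predictor formula for the multiple correlation. When both validities equal r, that formula reduces to R² = 2r²/(1 + r₁₂), which falls as the intercorrelation r₁₂ rises. The function below is a minimal sketch of that textbook formula. The validity of .30 is illustrative, not a value from this study.

```python
from math import sqrt


def multiple_r_two(r1c, r2c, r12):
    """Multiple correlation of a criterion with two predictors, given their
    validities r1c, r2c and their intercorrelation r12."""
    r_squared = (r1c ** 2 + r2c ** 2 - 2 * r1c * r2c * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)


# Two items of equal validity .30: the less they overlap, the better they predict.
for r12 in (0.0, 0.2, 0.5):
    print(r12, round(multiple_r_two(0.30, 0.30, r12), 3))
```

The loop prints multiple correlations of about .424, .387, and .346 as the intercorrelation grows from 0 to .5.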


Homogeneous Keying


Zubin (26) was perhaps the first to apply different methods of computing item-total relationships in an attempt to develop a homogeneous test. He noted that when suitable external criteria are lacking, as is often the case with personality inventories, relying on the internal consistency of the test is the next best approach.

Factor analytic techniques have been combined with item analysis in such tests as the Guilford-Martin "Inventory of Factors GAMIN" and the Guilford "Inventory of Factors STDCR." The major criticisms directed against these tests are their lack of validation data and their laborious statistical computation. At the same time, the scales overlapped substantially; in some cases they were intercorrelated as high as the .70's. Favorable criticism of the technique centers on the general advances it has brought to test construction, as well as its independence of obsolete and unreliable psychiatric classification (4, pp. 80, 82). This latter point may, of course, be made for any of the methods that aim at the development of homogeneous tests.

Loevinger (16, 17) conceives of homogeneity essentially as the average correlation of items within the test. She presents two coefficients: one gives the degree of homogeneity between any two items, and the other the homogeneity of the test. Cronbach and Damrin (5), however, have criticized Loevinger's coefficients as being markedly dependent on the difficulties of items. They further demonstrated that the coefficients do not apply when the relationships between items are low. It should be noted that the concept of homogeneity depends on the type of test involved. In ability tests item relationships are high, whereas in personality-type tests intercorrelations between items are characteristically moderate or low. Finally, Cronbach and Damrin showed that Kuder-Richardson Formula 20, or its derivative "phi bar," was sufficient to show the equivalence of the items up to the point where the correlations between items of equal difficulty rise to .80 or .90. This formula is the mean of all possible split-half coefficients of the test. It might be directly interpreted as the proportion of test variance contributed by the common factors among the items. This systematic use of Kuder-Richardson Formula 20 represents an untested, though fairly laborious, approach to constructing a homogeneous test.
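Kuder-Richardson Formula 20 for dichotomous items is KR-20 = k/(k - 1) × (1 - Σpq/σ_t²), where k is the number of items, p and q the proportions passing and failing each item, and σ_t² the variance of total scores. The following is a minimal sketch, assuming 0/1 item scores with one row per examinee. The average inter-item correlation is included only as a simple stand-in for Loevinger's notion of homogeneity, not as her coefficients.

```python
from math import sqrt
from statistics import pvariance


def kr20(items):
    """Kuder-Richardson Formula 20 for 0/1 item scores (one row per examinee)."""
    n, k = len(items), len(items[0])
    totals = [sum(row) for row in items]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in items) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


def mean_inter_item_r(items):
    """Plain average of the item intercorrelations (phi coefficients for 0/1
    items); an item with zero variance would cause a division by zero."""
    k = len(items[0])
    cols = [[row[j] for row in items] for j in range(k)]

    def r(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(r(cols[i], cols[j]) for i, j in pairs) / len(pairs)
```

Identical items give KR-20 = 1, and mutually uncorrelated items give 0. The formula uses population variances throughout, so that p(1 - p) and σ_t² are on the same footing.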
Another method that results in a homogeneous test is referred to as "maximizing test saturation" by DuBois, Loevinger, and Gleser (6). Briefly, this method takes into account the ratio of the common factor variance that the items contribute to the total variance of a test. This ratio has been called "the saturation of the test." Items are added successively to a nucleus of three or four highly intercorrelated items so as to maximize the saturation, and this should result in a homogeneous test. Moreover, if one were to start with nuclei that have little in common, the keys subsequently developed should be relatively independent. This method was selected to represent the homogeneity approach.
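The saturation ratio and the successive-addition procedure can be sketched as follows. The sketch assumes that saturation is twice the sum of the inter-item covariances divided by the variance of the total score, that is, the item variances plus twice the covariances. That is how the saturation note to Table 1 is read here. It also uses a simple greedy rule: add whichever item raises saturation most, and stop when none does. The function names and the stopping rule are illustrative, not DuBois, Loevinger, and Gleser's published procedure.

```python
def cov(x, y):
    """Population covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def saturation(items, key):
    """2 * (sum of inter-item covariances) / variance of the key total,
    where the total variance is the item variances plus twice the covariances."""
    cols = [[row[j] for row in items] for j in key]
    variances = sum(cov(c, c) for c in cols)
    covs = sum(cov(cols[a], cols[b])
               for a in range(len(cols)) for b in range(a + 1, len(cols)))
    return 2 * covs / (variances + 2 * covs)


def grow_key(items, nucleus, candidates):
    """Greedily add the candidate that most raises saturation; stop when none does."""
    key = list(nucleus)
    current = saturation(items, key)
    while True:
        best, best_value = None, current
        for j in candidates:
            if j in key:
                continue
            value = saturation(items, key + [j])
            if value > best_value:
                best, best_value = j, value
        if best is None:
            return key
        key.append(best)
        current = best_value
```

Starting `grow_key` from several nuclei that correlate little with one another would, as the text suggests, tend to yield relatively independent keys.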

Examination of the history of keying tests homogeneously reveals little application to test validation. What has been done has been carried out only for personality-type tests where external criteria are unsuitable or lacking. Empirical keying has generally been used whenever a choice between the two approaches was to be made. It would appear that a crucial study would compare two rigorous methods, one representing each approach, for keying the same biographical inventory. This, in short, is the overall purpose of this study.


HYPOTHESES  TESTED 

The  following  hypotheses  are  tested  by  this  study: 

1. The empirical keys will show higher correlations with the criteria than the homogeneous keys on the developmental sample. In the first place, Biographical Inventory CE608C was developed by including those items shown to be valid for predicting OCS success. Secondly, the empirical keys included items on the basis of their specific contribution to the prediction of an external criterion. Homogeneous keys, on the other hand, are constructed solely on the basis of the internal consistency of the items, which may or may not be related to the criterion. The empirical keys are therefore expected to be characteristically more valid than the homogeneous keys.

2. The empirical keys will show greater shrinkage and lower cross-validity than the homogeneous keys. The items of the homogeneous keys tend to duplicate each other, which probably results in the cancellation of chance errors. By contrast, the empirical keys will approach the heterogeneity of the criteria they are designed to predict. For this reason homogeneous keys can be expected to be generally more reliable than empirical keys. Guilford (9) has also pointed out that factorially impure tests (empirical keys) contain variance that is unrelated to the criterion. This invalid variance adds spuriously to the validity when chance deviations are optimally weighted, and it lowers cross-validity just as a like amount of error variance would. These two factors would therefore be expected to produce greater shrinkage and lower cross-validity for the empirical keys than for the homogeneous keys.


3. The homogeneous keys will be psychologically meaningful, and the empirical keys will be psychologically difficult to interpret. Insofar as homogeneous keying succeeds in its primary goal of maintaining psychological purity, the resulting keys should be relatively simple to interpret. Since the empirical keys will be composed of a multitude of factors, it should be difficult to know which specific factors to invoke, and to what degree, in order to explain a person's score. In this regard Guilford and Lacey (10) state:

"This (empirical) procedure would seem merely to result in an extension ... to new valid territory, rather than to increase our [understanding] of why tests are valid and therefore to improve our control over the validity already achieved."


POPULATION AND CRITERIA

The first step in keying by either the homogeneous or the empirical approach is to obtain the sample with which the keys are to be developed. The sample on which the homogeneous keys were to be developed was designated Sample A. It was obtained by selecting every third airman from the total pool of basic airmen who were administered CE608C during November, until a total of 1000 papers had been obtained. Most of the airmen were in their second week of military experience. These 1000 papers were then scanned for completeness and correct scoring. The sampling and screening process was continued until 1000 usable papers had been obtained.
The sample on which the empirical keys were to be developed, and on which the homogeneous keys were to be validated, was designated Sample B. This sample was composed of all available male graduates and eliminees of Officer Candidate School (OCS) Classes 50-A, 50-B, and 50-C, and totaled 336 graduates and 78 eliminees. The sample with which both the empirical keys and the homogeneous keys were cross-validated was designated Sample C. This sample included all available male graduates and eliminees of OCS Classes 51-A and 51-B, and totaled 300 graduates and 29 eliminees.


The homogeneous keys were developed on airmen and validated on officer candidates, while the empirical keys were developed on officer candidates. This difference may represent a serious limitation of the study. The use of airmen was necessitated by the lack of a sufficient number of officer candidates who had been administered CE608C and for whom criterion data were available. The basic personality variables elicited by CE608C may have differed somewhat between the airman and officer candidate populations, and this may have reduced the validity of the homogeneous keys. A comparison of the two sets of keys should therefore be interpreted with this limitation in mind.
2. OCS military grades. These were ratings of military potential made by the Tactical Officer in charge of each flight. Each flight was composed of officer candidates.

3. OCS academic grades. This criterion was determined by weighting the objective achievement test scores in the various academic subjects taught, with weights that depended on the number of hours devoted to each course. These subjects included personnel methods, supply, administration, military law, etc.

4. OCS final grades. These grades were obtained by equally weighting the academic and military grades into a composite.
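The following is a minimal sketch of the two composite criteria, under stated assumptions. Each subject's test scores are converted to standard scores before being weighted by course hours. "Equally weighting" is taken to mean summing the standardized academic and military grades. Subject names and hour counts in any example are hypothetical.

```python
from statistics import mean, pstdev


def z(xs):
    """Convert raw scores to standard (z) scores."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def academic_grade(scores_by_subject, hours_by_subject):
    """Hours-weighted composite of achievement-test scores.
    Assumption: each subject is standardized before weighting."""
    total_hours = sum(hours_by_subject.values())
    n = len(next(iter(scores_by_subject.values())))
    composite = [0.0] * n
    for subject, scores in scores_by_subject.items():
        weight = hours_by_subject[subject] / total_hours
        for i, value in enumerate(z(scores)):
            composite[i] += weight * value
    return composite


def final_grade(academic, military):
    """Equal-weight composite of the standardized academic and military grades."""
    return [a + m for a, m in zip(z(academic), z(military))]
```

Standardizing first keeps a subject with a large raw-score spread from dominating the composite regardless of its hours. Whether the original computation did this is not stated in the text.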


PRELIMINARY PROCEDURES


The first step in both the homogeneous and the empirical approach was the dichotomization and weighting of items. A decision had to be made on the most meaningful dichotomy of choices on a five-choice continuum. One part of the dichotomy was to be given unit weight, which meant arbitrarily assigning a zero weight to the other part.


To carry out the dichotomization and weighting of items, a subsample of papers was first selected at random from Sample A. An item count was then obtained of all possible answers. All the items were then split dichotomously, approximating a 50-50 split as far as possible. The splits were based on the item count and on the judges' logical consideration of the part each item might later play in a priori keys. In a few cases the items were "double-barrelled" or "bifurcated," presenting two possible splits with distinctly separate interpretations; there, two items were developed out of one.
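One way to mechanize the "closest to a 50-50 split" part of this judgment is sketched below. It assumes that only splits between adjacent options on the continuum are considered. The judges' logical considerations, which the text also mentions, are not modelled, and the function name is illustrative.

```python
def best_split(counts):
    """Choose where to cut a five-choice continuum into keyed and unkeyed parts.

    counts[k] is the number of examinees choosing option k + 1. Options
    1..cut form one side of the split. Returns (cut, proportion on that side)
    for the cut whose proportion is closest to .50."""
    n = sum(counts)
    best = None
    running = 0
    for cut in range(1, len(counts)):
        running += counts[cut - 1]
        p = running / n
        if best is None or abs(p - 0.5) < abs(best[1] - 0.5):
            best = (cut, p)
    return best
```

For option counts of 10, 20, 30, 25, and 15, the cut falls after the third option, giving a 60-40 split.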


It soon became apparent that many items had to be eliminated from further consideration because they were concerned with Air Force experience. These items would obviously not yield any appreciable valid variance since, as pointed out before, almost all the examinees were in their second week as airmen. After these eliminations, 183 of the items originally available for keying remained. These preliminary procedures were common to both the empirical and the homogeneous approach. From this point the two methods proceed in divergent directions.
THE PROBLEM

This study was to be carried out in accordance with the following objectives:

1. Derivation of homogeneous and relatively independent keys for Biographical Inventory CE608C on Sample A. This was to be done by the method of maximizing test saturation.

2. Intercorrelation of the homogeneous keys on Sample B.

3. Validation of each homogeneous key against each criterion of Sample B: final OCS grade, military OCS grade, academic OCS grade, and pass/fail in OCS.

4. Computation of beta coefficients for each homogeneous key for each criterion, and computation of the coefficients of multiple correlation with each criterion.

5. Scoring of Sample C on the homogeneous keys, and weighting of each homogeneous key score by the beta weights established in each multiple regression formula, as determined from Sample B.

6. Summation of the weighted homogeneous key scores to yield a predicted criterion score.

7. Correlation of predicted criterion scores with actual criterion scores for Sample C.

8. Derivation of the empirical keys for each criterion of Sample B by the Gleser-DuBois method for maximizing test validity.

9. Scoring of Sample C on each empirical key, and correlation of empirical key scores with criterion scores.

10. Comparison of the validities and cross-validities of both sets of keys, and evaluation of the relative difficulties and characteristics of each method.

11. Psychological comparison of both sets of keys.
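Objectives 4 through 7 amount to two stages. First, each criterion is regressed on the homogeneous key scores in Sample B. Then the same beta weights are applied in Sample C. The following is a minimal sketch in plain Python. It assumes that key scores are standardized with the Sample B means and standard deviations before weighting, which the text does not specify. The data at the bottom are toy values, not study data.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def beta_weights(keys, criterion):
    """Objective 4: beta weights and multiple R from Sample B.
    `keys` is a list of key-score columns, one per homogeneous key."""
    rxx = [[pearson(a, b) for b in keys] for a in keys]
    rxc = [pearson(a, criterion) for a in keys]
    betas = solve(rxx, rxc)
    multiple_r = sqrt(sum(b * r for b, r in zip(betas, rxc)))
    return betas, multiple_r


def predicted_scores(keys, betas, means, sds):
    """Objectives 5-6: weight each key score (in Sample B standard-score units) and sum."""
    n = len(keys[0])
    return [sum(b * (col[i] - m) / s for b, col, m, s in zip(betas, keys, means, sds))
            for i in range(n)]


# Toy "Sample B": two uncorrelated keys whose sum is the criterion.
keys_b = [[1, 2, 3, 4], [1, -1, -1, 1]]
crit_b = [2, 1, 2, 5]
betas, R = beta_weights(keys_b, crit_b)
means = [mean(k) for k in keys_b]
sds = [pstdev(k) for k in keys_b]
# Objective 7 on a toy "Sample C" (the same data here, so the result is 1.0).
r_c = pearson(predicted_scores(keys_b, betas, means, sds), crit_b)
```

With real data, `predicted_scores` would be called on the Sample C key scores, and the drop from `R` to `r_c` is the shrinkage the hypotheses refer to.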

HOMOGENEOUS KEYING¹

The first step of homogeneous keying was an a priori categorization, by three judges, of the 183 items available for keying. This resulted in the formation of 13 categories which showed promise of common factor content. Four of these 13 categories were combined in pairs, since it was felt that each of the four might possess a fairly high relationship with its respective paired member.

¹ A detailed theoretical and methodological presentation of the method for maximizing test saturation may be found in DuBois, Loevinger, and Gleser (6).

After the procedure for maximizing test saturation had been carried out, 13 first-cycle categories were derived from the a priori categories. The first-cycle categories included a total of 126 items, three of which were included in two categories. From the residual 57 unplaced items, one additional category of 11 items was developed. Each category was named after inspection of its item content. The category data, including the name, mean, variance, and saturation, are given in Table 1, and the category intercorrelations in Table 2. Each homogeneous category is identified by a letter which indicates its a priori cluster. Where more than one category was derived from an a priori cluster, a subscript accompanies the letter to indicate the order in which the categories evolved.

As may be noted in Table 2, one of the categories, Aggressiveness, seemed to resemble a general factor, since it correlated highly (above .34) with one-half of the other categories. Since the intention was to develop independent categories with as many items as possible, the eight items comprising this category were returned to the general pool of unused items, to be reconsidered after the independent categories had been developed. Two of the eight items were included in the independent categories in a later cycle.

The category intercorrelations in Table 2 were now examined to determine whether two or more highly correlated categories could feasibly be combined into a single matrix. When the general factor category was removed from consideration, seven correlations remained, ranging from .35 to .49. It was decided to delay any combination of categories until the intercorrelations of the completed first-cycle categories had been inspected. At that point it was found that removing the eight items from the first-cycle categories had reduced the number of correlations exceeding .35 from seven to two. It was therefore decided to continue the cycling without combining any first-cycle categories.

To achieve greater independence of categories without much loss of saturation, the categories were revised in a second and a third cycle. Tables 3 and 4 present the category data and category intercorrelations for Cycle 2, and Tables 5 and 6 do so for Cycle 3. Table 6 also includes the data for the revised general factor category after the inclusion of seven items which added to its saturation. Table 7 compares the independence of the categories across the cycling process. It may be noted that, to achieve a decrease of .05 in the average correlation between categories, 15 per cent of the total possible number of items had to be dropped between Cycle 1 and Cycle 3.
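The bookkeeping behind Table 7 and the general-factor decision can be sketched as follows. The sketch computes three things: the average of the absolute category intercorrelations (signs ignored, as in Table 7), the number of pairs above .35, and a flag for any category correlating above .34 with at least half of the others. The cutoffs come from the text; the function names and any example matrix are illustrative.

```python
def summarize_intercorrelations(r, threshold=0.35):
    """Average intercorrelation (signs ignored) and the number of pairs above
    `threshold`, for a symmetric category intercorrelation matrix `r`."""
    n = len(r)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    average = sum(abs(r[i][j]) for i, j in pairs) / len(pairs)
    high = sum(1 for i, j in pairs if abs(r[i][j]) > threshold)
    return average, high


def general_factor_like(names, r, cutoff=0.34):
    """Categories correlating above `cutoff` (in absolute value) with at least
    half of the other categories."""
    n = len(r)
    flagged = []
    for i in range(n):
        count = sum(1 for j in range(n) if j != i and abs(r[i][j]) > cutoff)
        if 2 * count >= n - 1:
            flagged.append(names[i])
    return flagged
```

Running both functions after each cycle gives the kind of cycle-by-cycle comparison that Table 7 summarizes.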
Table 1

First-Cycle Homogeneous Category Data
(Sample: 1000 basic airmen)

                                     No. of              Vari-     Satur-
     Category                        Items     Mean      ance      ation^c
A    Mechanical Aptitude               9       6.1?      3.16       .48
B1   Athletic Experience              13       6.29      9.49       .70
B2   Childhood Games                   8       6.21      3.30       .60
C    Playboy^a                        1?       4.22      3.98       .49
D    Socio-Economic                   17       9.04     14.08       .74
E    Schizoid^a                        6       1.96      1.94       .36
F1   Parental Criticism               11       6.91      8.02       .68
F2   Extroversion                     13       9.33      8.13       .67
F3   Aggressiveness                    8       4.74      3.34       .50
G    Itinerant^a                       6       2.63      2.94       .60
H    Scholarship                      13       4.32      6.94       .62
I    Societal Acceptance              1?       7.90      6.71       .59
J    Childhood Responsibility^b       11       9.13      9.82       .57

a These category names were changed from the a priori names following examination of item content.
b This category was developed out of the residual items (n = 57) unplaced in Cycle 1.
c Saturation is computed as

$$
\text{Saturation} = \frac{2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} C_{ij}}{\sum_{i=1}^{n} V_i + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} C_{ij}}
$$

where $C_{ij}$ = covariance between any two items $i$ and $j$, and $V_i$ = variance of any item $i$.


Table 2

Intercorrelations of First-Cycle Homogeneous Categories

See Table 1 for full category titles.


Table 3

Second-Cycle Homogeneous Category Data
(N = 1000)

                                     No. of              Vari-     Satur-
     Category                        Items     Mean      ance      ation
A    Mechanical Aptitude               7       4.89      2.07       .44
B1   Athletic Experience              10       4.77      6.50       .67
B2   Childhood Games                   7       5.?2      2.59       .57
C    Playboy                          11       3.?5      3.67       .?5
D    Socio-Economic                   14       7.55     10.60       .71
E    Schizoid                          6       1.96      1.94       .36
F1   Parental Criticism               11       6.51      8.02       .68
F2   Extroversion                     13       5.3?      8.13       .67
G    Itinerant                         6       2.63      2.54       .50
H    Scholarship                      11       3.81      5.66       .60
I    Societal Acceptance              14       7.87      7.?8       .60
J    Childhood Responsibility          7       3.32      2.99       .46
Table 5

Final Homogeneous Category Data: Third Cycle
(N = 1000)

                                     No. of              Vari-     Satur-
     Category                        Items     Mean      ance      ation
A    Mechanical Aptitude               7       4.89      2.07       .44
B1   Athletic Experience              10       4.77      6.50       .67
B2   Childhood Games                   7       5.52      2.59       .57
C    Playboy                          13       3.96      4.55       .52
D    Socio-Economic                   13       7.25      9.52       .71
E    Schizoid                          6       1.96      1.94       .36
F1   Parental Criticism               11       6.51      8.02       .68
F2   Extroversion                     11       4.64      6.40       .65
G    Itinerant                         6       2.65      2.54       .50
H    Scholarship                       8       2.75      3.55       .54
I    Societal Acceptance              13       7.51      6.66       .58
J    Childhood Responsibility          6       2.60      2.44        ?
F3   Aggressiveness                   15       8.59      8.96        ?
See Table 5 for full category titles.
Table 7

Comparison of Independence of Homogeneous Categories by Cycle^a

                                      Cycle 1   Cycle 1A^b   Cycle 2       Cycle 3
Total correlations                       78         66          66            66
Total items used                        140        132         117           111
Items added and/or dropped                      Dropped 8   Added 2,      Added 2,
                                                            dropped 17    dropped ?
Per cent of possible items used          73         68          61            58
Average correlation                     .2?1       .185        .133          .146

a Signs of intercorrelations are omitted.
b Category "Aggressiveness," which appeared as a general factor in Cycle 1, was dropped for Cycle 1A.


EMPIRICAL KEYING²

The tentative empirical keys were to be formed by including those items whose correlation with a criterion was significant at the .01 level of confidence. Examination of the correlations revealed that 16, 18, and 20 items qualified for inclusion in the final grade, military grade, and academic grade keys, respectively. However, only seven items qualified at the .01 level of significance for inclusion in the pass/fail key. The requirement for including an item in the pass/fail key was therefore lowered to the .05 level of significance. This decision added five more items, for a total of 12 items in the pass/fail key.
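The text does not state how significance was judged. The following is a minimal sketch, assuming a two-tailed test of each item-criterion correlation. It uses the normal deviate in place of t, which is a close approximation at N = 336 or N = 414, the Sample B sizes shown in Table 8. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha):
    """Approximate two-tailed critical correlation for a sample of n.
    Uses r = t / sqrt(n - 2 + t^2) with the normal deviate standing in for t."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n - 2 + z * z)


def significant_items(validities, n, alpha):
    """Indices of items whose item-criterion correlation reaches significance;
    the sign of r gives the keying direction."""
    cutoff = critical_r(n, alpha)
    return [j for j, r in enumerate(validities) if abs(r) >= cutoff]
```

Under these assumptions the .01 cutoff for N = 336 is about .14, and the .05 cutoff for N = 414 is about .10.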

Sample B answer sheets were scored on the tentative empirical keys, and these scores were correlated with their respective criteria. These correlations and other summary data for the tentative empirical keys, including the number of items in each key, the means, and the standard deviations, are presented in Table 8.

First-cycle empirical keys were then developed by the Gleser and DuBois method of maximizing test validity. The answer sheets were scored on the first-cycle keys, and these scores were correlated with their respective criteria. The key-criterion correlations and summary data for the first-cycle keys are presented in Table 9.

Since every first-cycle key-criterion correlation increased by at least four correlation points, a second cycle was carried out. At the completion of the second cycle, the key-criterion correlations were found to have increased only slightly over those of the preceding cycle, so no further refinement seemed necessary. The key-criterion correlations and summary data for the second-cycle keys are presented in Table 10. The changes from the tentative empirical keys to the final keys are presented in Table 11. It was now possible to score a new sample on both the empirical and the homogeneous keys and compare their respective validities.


VALIDATION  OF  KEYS 
Validation  of  the  Homogeneous  Keys 


It has been pointed out that the homogeneous keys were developed independently of any external criteria. Before cross-validation, therefore, three sets of data were needed for the criteria used to develop the empirical keys: the intercorrelations of the keys, their validities, and their beta weights. For control purposes, these data were obtained from the same validating sample with which the empirical keys were developed. This

² For a detailed theoretical and methodological presentation of the method of maximizing test validity, cf. Gleser and DuBois (8).
Table 8

Correlations and Summary Data of Tentative
Empirical Keys and Criteria

                                  Items
Criteria or key          n       in key      Mean      SD       r
Final gradeᵃ            336                  4.99     1.97     .39
Final grade key         336        16        8.62     2.04
Military gradeᵃ         336                  5.08     1.89     .50
Military grade key      336        18       11.18     2.56
Academic gradeᵃ         336                  5.03     1.96     .50
Academic grade key      336        20       11.57     2.63
Pass/fail               414                   .81      .39     .45ᵇ
Pass/fail key           414        12        6.77     1.76

ᵃ Standardized in stanine units.
ᵇ Biserial correlation coefficient, where p = .81 and q = .19.
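The biserial coefficient of footnote b can be computed from the two group means, the total-group standard deviation, and the ordinate of the unit normal curve at the point dividing the proportions p and q, using the textbook formula r_bis = (M_p - M_q) / σ_t × pq / y. The sketch below assumes that formula; the function name and the pass = 1 coding are hypothetical. For Sample B, p = 336/414, which rounds to the .81 given in the footnote.

```python
from statistics import NormalDist, fmean, pstdev
from typing import Sequence


def biserial(scores: Sequence[float], passed: Sequence[int]) -> float:
    """Biserial r between key scores and a dichotomy assumed to cut a normal variable.

    passed[s] is 1 for a graduate (the p group) and 0 for an eliminee (the q group);
    both groups must be non-empty.
    """
    hi = [x for x, ok in zip(scores, passed) if ok]
    lo = [x for x, ok in zip(scores, passed) if not ok]
    p = len(hi) / len(scores)
    q = 1.0 - p
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve at the z that leaves q below it and p above it.
    y = unit_normal.pdf(unit_normal.inv_cdf(q))
    return (fmean(hi) - fmean(lo)) / pstdev(scores) * (p * q / y)


if __name__ == "__main__":
    p = 336 / 414  # Sample B: 336 graduates among 414 candidates
    print(round(p, 2))  # 0.81
```

The factor pq / y always exceeds √(pq), so the biserial r is larger than the point-biserial r computed from the same data.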
Table 9

Correlations and Summary Data of Empirical
Keys and Criteria: First Cycle

                                  Items
Criteria or key          n       in key         Mean          SD            r
Final gradeᵃ            336                     4.99          1.97          [illegible]
Final grade key         336      29             [illegible]   [illegible]
Military gradeᵃ         336                     5.08          1.89          [illegible]
Military grade key      336      [illegible]    [illegible]   3.34
Academic gradeᵃ         336                     5.03          1.96          [illegible]
Academic grade key      336      [illegible]    20.16         3.38
Pass/fail               414                      .81           .39          [illegible]ᵇ
Pass/fail key           414      [illegible]    11.78         2.17

ᵃ Standardized in stanine units.
ᵇ Biserial correlation coefficient, where p = .81 and q = .19.

Table 10

Correlations and Summary Data of Final
Empirical Keys and Criteria

                                  Items
Criteria or key          n       in key      Mean      SD       r
Final gradeᵃ            336                  4.99     1.97     [illegible]
Final grade key         336        39       21.10     3.61
Military gradeᵃ         336                  5.08     1.89     [illegible]
Military grade key      336        40       27.86     3.67
Academic gradeᵃ         336                  5.03     1.96     [illegible]
Academic grade key      336        39       23.47     3.54
Pass/fail               414                   .81      .39     [illegible]ᵇ
Pass/fail key           414        19       12.49     2.50

ᵃ Standardized in stanine units.
ᵇ Biserial correlation coefficients, where p = .81 and q = .19.
Table 11

Comparison Between Tentative and Final Empirical Keys

                                        Per cent of      Increase
                                        possible         in r with
Key               Cycle       Items     items used       criterion
Final grade       Cycle 1      16            9
                  Final        39           20               .04
Military grade    Cycle 1      18            9
                  Final        40           21               .01
Academic grade    Cycle 1      20       [illegible]
                  Final        39           20           [illegible]
Pass/fail         Cycle 1      12       [illegible]
                  Final        19           10               .06


sample, known as Sample B, included 336 graduates and 78 eliminees of OCS
Classes 50-A, 50-B, and [illegible]-C.

Table 12 presents the intercorrelations of the homogeneous keys based upon
the independent Sample B. The magnitudes of the intercorrelations were
strikingly similar to those obtained on Sample A, and the average correlation
of the matrix, excluding the general factor category, increased by only .008.
Also noteworthy was the fact that the general factor now cut across the
categories less than it did in Sample A. This can probably be attributed to
the addition of seven items to the general factor, since that category was
not recorrelated with the other categories following its final revision in
the third cycle. A last point to be noted in Table 12 is the shrinkage of
some saturation coefficients. (Compare these saturations with those in
Table 7.) Shrinkage of the saturation coefficient occurs for the same reason
as for a correlation coefficient: the error factor in the first sample is
weighted in favor of the original keying, and since error variance does not
reproduce itself in subsequent administrations, additional error appears and
the saturation or correlation coefficient diminishes. It should be noted that
a shrunken saturation coefficient represents a truer estimate of homogeneity.


Having obtained a truer estimate of the intercorrelations of the homogeneous
keys and their separate validities on four criteria, four sets of beta
weights and four multiple correlations were computed. The detailed data for
these multiple correlations, including the homogeneous keys comprising the
predictor composite, beta weights, validities, and multiple R's, are given in
Tables 13 through 16.
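The beta weights and multiple correlations described above follow from the intercorrelations of the keys and their validities through the normal equations R_kk β = r_kc, with R² = Σ β_k r_kc. The sketch below solves these equations in plain Python. The function names are hypothetical, and the report does not say which computing method was actually used.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def beta_weights(
    key_intercorrelations: List[List[float]], validities: Sequence[float]
) -> Tuple[List[float], float]:
    """Standardized regression (beta) weights and the multiple R for a set of keys."""
    betas = solve(key_intercorrelations, validities)
    r_squared = sum(b * v for b, v in zip(betas, validities))
    return betas, sqrt(r_squared)


if __name__ == "__main__":
    betas, multiple_r = beta_weights([[1.0, 0.5], [0.5, 1.0]], [0.5, 0.5])
    print([round(b, 3) for b in betas], round(multiple_r, 3))  # [0.333, 0.333] 0.577
```

In this illustration two keys that correlate .50 with each other and .50 with the criterion yield a multiple R of about .58, whereas two uncorrelated keys of the same validity would yield about .71. This is why the relative independence of the homogeneous keys matters for the multiple.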


Cross-Validation of the Homogeneous and Empirical Keys

Sample C, which was composed of [illegible] graduates and [illegible]
failures of OCS Classes 51-A and 51-B, was scored on each of the four
empirical keys and on the 13 homogeneous keys. Three Pearson product-moment
correlations were obtained for the empirical keys against their respective
OCS grades, and one biserial correlation coefficient was computed for
pass/fail on its empirical key. These correlations represented the
cross-validities of the empirical keys. In order to obtain the multiple
validities of the homogeneous keys, the raw scores of the keys comprising the
predictor composite were weighted by their particular regression weights.
These weighted scores were summed, along with the constant term, to give a
composite predicted criterion score for each subject. Each predicted
criterion score was then correlated against the subject's obtained criterion
score to give the multiple-validity correlation. A comparison of the data
comprising the cross-validation is given in Table 17.

One of the most important comparisons of the two keys to be made in this
study was between the validities of the empirical keys and the multiple
validities of the homogeneous keys. Table 17 gives the critical ratios for
the differences. Inspection of Table 17 reveals that in cross-validation
there was a tendency for the homogeneous keys to predict two of the criteria
better than the empirical keys, and for the empirical keys to predict the
other two criteria better than the homogeneous keys. Since none of these
differences was significant, it is concluded that, insofar as these data are
concerned, neither the empirical nor the homogeneous method of keying proved
superior.


A second important comparison to be made between the two keys was the
comparison of the shrinkages. Table 18 presents the relevant data. It may be
noted that for all four criteria the shrinkages of the empirical keys were
significantly greater than zero beyond the .01 level of confidence. The
shrinkage resulting from the cross-validation of the homogeneous keys was
significant beyond the .05 confidence limit on only one criterion, academic
grade. The homogeneous keys showed significantly less shrinkage than the
empirical keys beyond the .01 confidence level on military grade and
pass/fail, and beyond the .05 confidence level on academic grade.²
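Footnote 2 indicates that the standard error of shrinkage was obtained through Fisher's z transformation. A common form of that test for correlations from two independent samples is CR = (z_dev - z_cross) / √(1/(n_dev - 3) + 1/(n_cross - 3)), with z = arctanh r. The sketch below assumes this form; the report may have computed its standard errors differently, and the example values are hypothetical rather than taken from Table 18. As the footnote warns, the 1/(n - 3) variance holds for a Pearson r, so applying it to a biserial r understates the standard error and overstates the critical ratio.

```python
from math import atanh, sqrt


def shrinkage_critical_ratio(r_dev: float, n_dev: int, r_cross: float, n_cross: int) -> float:
    """Critical ratio for the drop from development r to cross-validation r.

    Treats the two samples as independent and uses Fisher's z = atanh(r),
    whose sampling variance is approximately 1 / (n - 3) for a Pearson r.
    """
    z_dev, z_cross = atanh(r_dev), atanh(r_cross)
    se_diff = sqrt(1.0 / (n_dev - 3) + 1.0 / (n_cross - 3))
    return (z_dev - z_cross) / se_diff


if __name__ == "__main__":
    # Hypothetical: r drops from .50 to .20, with 103 cases in each sample.
    print(round(shrinkage_critical_ratio(0.50, 103, 0.20, 103), 2))  # 2.45
```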


PSYCHOLOGICAL COMPARISON OF THE KEYS


The last comparison to be made between the empirical and the homogeneous
keys was the degree to which the scores on each led to a better
understanding of the criteria, that is, the degree to which each was
psychologically meaningful. In a pamphlet prepared to set forth the
objectives of the OCS curriculum, certain traits were hypothesized which
seemed to discriminate the superior officer from the poor officer (25). With
these desirable traits as the criterion, three judges, independently and
later [...] criterion definition.


After an examination of the items of the empirical keys, there was
unanimous agreement that about two-thirds of the items bore no logical
relationship with the criterion and that most of the remaining items bore
only an indirect relationship at best. Examples of such items, found with a
positive validity in two or more of the empirical keys, are "played card
games in childhood" and "carried on woodworking and cabinet-making as a
hobby"; items with significant negative validity were "having ridden a horse
in childhood" and "having driven a motor boat."


From another point of view, at least two desirable traits of the superior
officer, which the judges agreed were measured by various items in CE608C,
were superior scholarship and cooperation with fellow workers or group
participation. Examination of the valid items hypothetically related to
scholarship shows that, as expected, subjects having superior high school
grades were favored by the keying. Other items are keyed as valid, however,
which on the surface appear inconsistent with superior scholarship; e.g., the
more successful officer candidates had only a high school education or less.
Whatever the conditions were which caused the less educated subjects to excel
in OCS, it is logical to assume that such conditions would not repeat
themselves from year to year. Two items which were related to scholarship but
were keyed in a direction contrary to expectations include: (1) adverse
feelings toward education, and (2) an eighth-grade education or less for
fathers of officer candidates.

² Since the standard error of shrinkage on pass/fail is based upon the
transmutation of biserial r to Fisher z, it is probably an underestimate.
Caution should therefore be exercised in interpreting the resulting critical
ratio, which is probably overestimated.

Table 17

Cross-Validation of Empirical and Homogeneous Keys
(Sample: officer candidates of Classes 51-A and 51-B)

[Table body illegible in source]

ᵃ Biserial correlation coefficient, where p = .91 and q = .09.

The second hypothesis, cooperative working with others, which also needed
to be measured by CE608C, was concerned with several items keyed as valid
which would appear to discriminate the more cooperative from the less
cooperative. However, other items were keyed as valid which appeared to be
oppositely related to social cooperation and participation, such as a
preference for working alone, no experience as an instructor or group
leader, or a desire to be [illegible]. At the same time, such items as the
desire to advise or help others and active participation in various club
activities remained unkeyed. The attempt to isolate the above two
"desirable" traits by examination of the valid items was fruitless.


An equally critical appraisal should be made of the psychological
meaningfulness of the homogeneous keys, with the objective of a better
understanding of the criteria. On the basis of an inspection of the item
content, descriptions of the 13 categories are given below. The descriptions
"typify" the high-scoring individual.

1. Mechanical Aptitude: A person scoring high in this category has
carried on woodworking as a hobby, has a shop in the home, has excelled in
shop work in school, and has made various kinds of mechanical repairs in his
youth as well as in adulthood.

2. Athletic Experience: This individual has engaged in various team
sports, often as a captain or coach. He has frequently engaged in various
types of individual sports, and he has excelled in physical training in
school.

3. Childhood Games: This subject, as a child, has participated in such
games as checkers, dominoes, and card games, digging caves, and building
club houses.

4. Playboy: This person has participated in various forms of gambling in
high school. He prefers playing poker over playing softball, winning a large
sum of money over finding a similar unclaimed sum, working from 9:30 to 5:30
over 7:30 to 3:30, a clever friend over an honest one, and staying at home
to read over going on a hike. He does not believe in, or is unable to stick
to, a budget, and he frequently goes nightclubbing during recreational
hours.


5. Socio-Economic: In the home of the high-scoring subject of this
category there would be such things as a waffle iron, vacuum cleaner,
extension telephone, television set, automatic water heater, and a large
number of books. The father and mother of this subject have at least entered
high school, and the subject has no more than two siblings.

6. Schizoid: This individual doesn't like to talk over personal problems.
He doesn't expect his friends to help him out of a jam. He feels that what
other people do is their business, and he prefers to be left alone. He has
few friends, if any.

7. Parental Criticism: This high-scoring subject has often been
criticized by his parents over such issues as relations with the opposite
sex, gambling, smoking, drinking, choice of career, and not attending church.

8. Extroversion: This person has been a leader in school or a club, a
class officer, debater, active member in dramatics, an instructor, and/or a
camp leader.

9. Itinerant: This individual has hitch-hiked farther than 100 miles on a
trip before completing high school. He prefers work with opportunity for
travel and adventure over good pay and promotion, working in different places
over working in the same building, changing jobs often over working at the
same job, and being sent overseas over staying in the United States.


10. Scholarship: This person has excelled in all courses in high school;
he has never failed a course. He has often visited a library or museum in
his recreational hours or on vacations.

11. Societal Acceptance: This subject believes that laws, judges, and
juries are not prejudicial, that there is much fun and few worries in life,
and that education does not lead to discontent. He is, further, against
crossing picket lines and in favor of labor's striking. He also would not
prefer more color in the Air Force uniforms.

12. Childhood Responsibility: Prior to high school this subject rode an
interurban bus or train alone. He has had the responsibility for [...]
charge account and has owned a car when in high school, and has made a
business deal in excess of $300.


13. Aggressiveness: This is the general factor. This high-scoring
individual has had fist fights in his youth. He also gambled and made
long-distance calls before he was 18 years old. He was very athletic, having
captained or coached a team. He has been fairly proficient in such sports as
diving, boxing, wrestling, and football. He admits beating someone in a
trade and having taken advantage of someone slyly. He has been the leader of
public meetings and bull sessions, and has, or has had, many dates per week.

It may be noted from the above descriptions that, in contrast to the
empirical keys, all the categories deal with a central theme of greater or
lesser complexity. The comparison of the two sets of keys as they relate to
criterion definition is discussed in the next section.


INTERPRETATION OF RESULTS


Evaluation of the Cross-Validation


With reference to the hypotheses previously stated, the following
conclusions are indicated by the data of this study, and each is discussed
briefly in turn:

1. The empirical keys yielded higher correlations with the criteria than
the homogeneous keys on the development sample. As was previously stated,
Biographical Inventory CE608C was originally devised by the selection of
valid items, and item inclusion in the empirical keys was based upon each
item's unique contribution to that validity. In addition, it was found that
[illegible] to 46 per cent of the items constituting the empirical keys were
either too heterogeneous or not in sufficient number to be included in the
homogeneous keys. This represented a considerable source of validity
untapped by the homogeneous keys.

2. The shrinkages of the empirical keys were significantly greater than
those of the homogeneous keys. Since the empirical keys had higher
correlations with all criteria, greater shrinkage might be related to a
larger original correlation rather than, or in addition to, the differences
in homogeneity. A research design to discover these relationships would
require the comparison of shrinkages of a large number of both homogeneous
and heterogeneous keys. This laborious job is beyond the scope of this
study.
3. Neither method of keying yielded superior validities. While the
difference between the validities was not significant, the empirical keys
yielded higher validities for the prediction of academic grades and final
grades, and the homogeneous keys yielded higher validities for the
prediction of military grades and pass/fail. This seems worthy of further
investigation, since it is possible that empirical keys may relate to the
prediction areas already accounted for by aptitude and achievement tests,
while homogeneous keys may relate to the relatively unexplained social area
[...] psychologically meaningful while the empirical keys were not. Among
the objectives of keying a heterogeneous test should be included not only
the prediction of the criterion but also an increased understanding of the
characteristics of the criterion. Increased knowledge of the criterion will
help to give a clearer perspective for the development and execution of a
training program and a clearer picture of the actual versus the probable
measures of success. The extent to which the two methods of keying have
added to knowledge of the criterion should be examined critically.

It was noted how inadequate the empirical keys were in criterion
definition. Only about one-third of the items in the empirical keys could be
related, even indirectly, to desirable traits of superior officers as set
forth by command judgment. Since the items comprising the empirical keys
were each weighted equally (unity), it was impossible to know which factors
to invoke to explain the criterion variance accounted for by the key.


In contrast to the empirical keys, each of the homogeneous keys was
relatively easy to define. The part that each key played in explaining [...]
regression equation. Coincident with criterion definition, the test
constructor is given many clues as to how the multiple correlation may be
increased, both by the addition of any missing homogeneous tests and by
increasing the breadth of the more relevant scales.


It should be pointed out, however, that the validities to which the
discussion of criterion definition has been relevant ranged from .15 to .30.
The insights into the criteria which are provided by the keys cannot be
related, therefore, to more than 2 to 9 per cent of the criterion variance.
It must be concluded that the greater utility which is posited for the
homogeneous keys is based on intuitive and not empirical grounds.
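The 2 to 9 per cent figures follow from squaring the end points of the validity range, since r² is the proportion of criterion variance a predictor accounts for:

```latex
r^2 = (.15)^2 = .0225 \approx 2\ \text{per cent}, \qquad r^2 = (.30)^2 = .09 = 9\ \text{per cent}
```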


[Illegible] Versus Homogeneous [Illegible]


The last comparison to be made between the homogeneous and empirical keys
is the manner in which both keys fit into an extended program of research.
Since a good deal of time and effort is usually expended in order to evolve
fairly stable keys, the job of keying is usually carried on with long-range
use in mind. It should be noted, particularly with biographical or
attitudinal-type information, that periodic revalidation of the items is
essential. Items relating to socio-economic areas, educational areas, and
broad attitudinal questions concerning personal adjustment are just a few
types of items with transient validities, both from time to time and from
group to group. Anastasi (2) states that the distinction between the test
and the criterion is merely one of practical convenience, and she urges that
every test score be operationally defined in terms of empirically
demonstrated behavior. The literature is replete with the many ways by which
criteria may be biased (cf. Brogden and Taylor, 3). Validation of the items
must, therefore, keep pace with the vagaries of criterion change, and it is
in this regard that the question should be asked, "How difficult would it be
to keep each set of keys up to date?"

Unless only slight changes occur, either in the revision of the criteria or
in the inclusion of additional items, empirical keying would have to start
entirely anew. A priori analysis is usually too gross to estimate accurately
how "slight" the changes in the criterion are from year to year, and which
items are most affected by such changes. In addition, it is apparent that
with the appearance of each new criterion, a new keying procedure would be
required. On the other hand, insofar as the homogeneous keys are concerned,
the entire keying procedure would have to be repeated only with very gross
changes in the test itself. Where there were either revisions of the
criteria or additions of new criteria, the same homogeneous keys could be
used to obtain new series of significant beta weights.

This procedure involves nothing more than revalidating each key on each new
criterion and computing the multiple-regression coefficients. Where
additional homogeneous tests are to be devised to measure inadequately
covered areas of the criterion, the old homogeneous categories can be
retained, and the statistical labor of category evolvement and refinement
need only be concerned with the new categories. It may be seen clearly that
homogeneous keying, in contrast to empirical keying, is amenable to an
expanding and continuous research program.


SUMMARY AND CONCLUSIONS


This study utilized two different approaches in the selection and
weighting of items for the prediction of an external criterion. The first,
or empirical, approach has been, and still is, the more commonly used in the
construction of scoring keys. In this method the behavior to be predicted
was predefined by means of an objective criterion external to the group of
items which would later constitute the test. The second, or rational,
approach developed in response to the lack of suitable external criteria. It
was noted that even though suitable criteria were nonexistent, certain
rational hypotheses about the behavior to be predicted might be agreed upon
by experts, and [...] then be determined by the extent to which it measured
the behavioral complex that the entire test measured.


It is apparent that, in contrast to the empirical method, the selection of
items on the basis of internal consistency would result in a test of narrow
significance in relation to the criterion, especially where the behavior to
be predicted was itself poorly defined. Realizing this inadequacy, test
makers then resorted to the use of unrelated groups of rational hypotheses
and the consequent construction of multiple tests, each of which was to
represent a portion of the criterion complex.

In this study the latter approach has been somewhat departed from, inasmuch
as the study was restricted to the use of the previously constructed
Biographical Inventory CE608C. This inventory grew out of the compilation of
the most valid items of previous inventories plus additionally edited items,
and it was used experimentally by the United States Air Force. Even though
the inventory was not developed in accordance with predetermined rational
hypotheses, it was fortunately later shown that the items could be analyzed
into meaningful subgroups. It was thus possible to analyze the Biographical
Inventory with both the rational and the empirical approach. The study was
designed to key this heterogeneous assortment of biographical items
systematically by the two independent methods and to follow with a
statistical and psychological comparison, including the validation of the
rational or homogeneous keys and a cross-validation of both sets of keys on a
subsequent sample.

The sample with which both keys were validated and cross-validated was
officer candidates in the Air Force. Since there was an insufficient number
of officer candidates who had been administered CE608C and for whom
criterion grades were available, the homogeneous keys were developed on a
sample of 1000 basic airmen from the airman population.

The homogeneous keys were derived by the method of maximizing test
saturation. This method basically maximizes the item contribution of
common-factor variance to the total variance of the test. Out of 183 items
available for keying, 111, or 58 per cent, were used to evolve 12 fairly
independent homogeneous categories (average r = [illegible]). Seven items
unused in the independent categories, plus eight items which were used in
the independent categories, were combined to form a thirteenth category. This
category correlated highly with one-half of the independent categories and
thus tended to be a general factor.
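The study's method of maximizing test saturation is not reproduced here. As a simpler stand-in with the same aim (growing a category whose items share common variance), the sketch below adds items to a seed cluster as long as each addition raises the Kuder-Richardson formula 20 (KR-20) internal-consistency coefficient. The function names, the seed-cluster starting point, and the `min_gain` threshold are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def kr20(responses: List[Sequence[int]], items: Sequence[int]) -> float:
    """Kuder-Richardson formula 20 for the 0/1 items listed in `items`."""
    k = len(items)
    if k < 2:
        return 0.0
    n = len(responses)
    totals = [sum(row[i] for i in items) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    if var_total == 0:
        return 0.0
    sum_pq = 0.0
    for i in items:
        p = sum(row[i] for row in responses) / n  # proportion giving the keyed response
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


def grow_homogeneous_key(
    responses: List[Sequence[int]],
    seed_items: Sequence[int],
    candidates: Sequence[int],
    min_gain: float = 0.0,
) -> Tuple[List[int], float]:
    """Add candidate items one at a time while each addition raises KR-20 by more than min_gain."""
    chosen = list(seed_items)
    current = kr20(responses, chosen)
    while True:
        remaining = [i for i in candidates if i not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda i: kr20(responses, chosen + [i]))
        best_value = kr20(responses, chosen + [best])
        if best_value - current <= min_gain:
            break  # no remaining item makes the category more internally consistent
        chosen.append(best)
        current = best_value
    return chosen, current


if __name__ == "__main__":
    rows = [
        [1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 1],
        [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1],
    ]
    chosen, value = grow_homogeneous_key(rows, seed_items=[0, 1], candidates=[2, 3])
    print(chosen, round(value, 3))  # [0, 1, 2] 0.758
```

Note that, unlike the empirical sketch, no criterion appears anywhere in this selection; that independence from the criterion is the defining property of the homogeneous approach.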

By the Gleser-DuBois method for maximizing test validity, four empirical
keys were developed on the four criteria: final grade, military grade,
academic grade, and pass/fail. The keys were composed of 39, 40, 39, and 19
items, or 20, 21, 20, and 10 per cent, respectively, of the 183 items
available.

The empirical keys yielded four correlations with the criteria for the
sample on which they were constructed, ranging from .43 to .58. Validation of
the homogeneous keys on the same sample resulted in four multiple
correlations ranging from .28 to .35. The independence of the homogeneous
keys, excluding the general factor, held up in this sample, since the
average intercorrelation increased by less than .01.
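The average intercorrelation used here as a check on the independence of the homogeneous keys is the mean of the off-diagonal entries of the key intercorrelation matrix, computed with the general-factor category left out. A minimal sketch, with a hypothetical function name and made-up matrix values:

```python
from typing import Iterable, List


def mean_intercorrelation(r: List[List[float]], exclude: Iterable[int] = ()) -> float:
    """Mean of the off-diagonal correlations, ignoring categories whose index is in `exclude`."""
    skip = set(exclude)
    keep = [i for i in range(len(r)) if i not in skip]
    # Each unordered pair of retained categories is counted once (upper triangle).
    pairs = [r[i][j] for pos, i in enumerate(keep) for j in keep[pos + 1:]]
    return sum(pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative 3 x 3 matrix; category 2 plays the part of the general factor.
    r = [[1.0, 0.2, 0.6], [0.2, 1.0, 0.5], [0.6, 0.5, 1.0]]
    print(round(mean_intercorrelation(r), 3))               # 0.433
    print(round(mean_intercorrelation(r, exclude=[2]), 3))  # 0.2
```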

The cross-validation of both sets of keys on an external sample resulted in
considerable shrinkage, which may have been caused by criterion instability
or by capitalization on chance error in the first sample. The
cross-validity coefficients ranged from .17 to .30 for the empirical keys and
from .15 to .26 for the homogeneous keys.

On the basis of the statistical and psychological comparisons made between
the two sets of keys, the following conclusions are drawn:

1. While few homogeneous key validities were significant, the multiple
correlations of the optimally weighted keys against each criterion were
highly significant. This was because the valid variance of the individual
keys was fairly specific.

2. The independent homogeneous keys accounted for most of the valid
variance in each multiple correlation; therefore, the homogeneous key
resembling a general factor added negligibly to the multiple.

3. Both the empirical and homogeneous keys yielded significant
validities.

4. [...] of keying proved superior.

5. Both sets of keys showed significant shrinkages, with the empirical
keys showing significantly greater shrinkage than the homogeneous keys for
all four criteria. This can be explained by the greater capitalization on
chance error by the empirical method.

6. The homogeneous keys were psychologically meaningful and the empirical
keys were not. The former should therefore provide more clues for criterion
definition and revision; however, the validities in this study were of
insufficient magnitude to demonstrate this empirically.


On the basis of the above conclusions and within the limitations of this
study, it is recommended that where a heterogeneous test is being keyed on a
strictly empirical basis, the method should be evaluated in relation to
criterion improvement and understanding as well as prediction.